Using feature structures as a unifying representation format for corpora exploration

Authors

  • Julien Nioche
  • Benoît Habert

Abstract

In this paper we report on the use of feature structures to represent the linguistic information of a corpus. This approach has been adopted in TyPTex, a project that aims to provide a generic architecture for corpora profiling. After a brief overview of the TyPTex project, we show that corpora exploration requires manipulating linguistic features, either to obtain the required level of linguistic information or to change the set of features and gain a new point of view on the data. We show that the feature structure formalism can help build and manage linguistic features through Meta-Rules based on unification. Finally, we provide an example of marking that combines projection of information from a static lexicon with contextual marking via Meta-Rules. The results tend to show that using feature structures can improve the coverage and reliability of the marking.
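The abstract combines two ingredients: unification of feature structures, and Meta-Rules that mark tokens in context on top of information projected from a static lexicon. The Python sketch below illustrates that combination under simplifying assumptions; it is not TyPTex's implementation. Feature structures are modelled as plain nested dicts without structure sharing, and the lexicon entries, the feature names (`cat`, `agr`, `role`) and the rule format (`left`, `target`, `add`) are all hypothetical.

```python
from typing import Any, Optional

# Assumption: a feature structure is a nested dict whose atomic values are
# strings and whose complex values are dicts (no re-entrancy / sharing).
FS = dict[str, Any]


def unify(a: FS, b: FS) -> Optional[FS]:
    """Return the unification of two feature structures, or None on a clash."""
    result: FS = dict(a)  # shallow copy; inputs are never mutated
    for feat, b_val in b.items():
        if feat not in result:
            result[feat] = b_val
            continue
        a_val = result[feat]
        if isinstance(a_val, dict) and isinstance(b_val, dict):
            sub = unify(a_val, b_val)  # unify complex values recursively
            if sub is None:
                return None
            result[feat] = sub
        elif a_val != b_val:
            return None  # incompatible atoms, or atom vs. complex value
    return result


# Hypothetical static lexicon: word form -> candidate readings.
LEXICON: dict[str, list[FS]] = {
    "the": [{"cat": "det"}],
    "he": [{"cat": "pron", "agr": {"num": "sg", "pers": "3"}}],
    "walks": [
        {"cat": "noun", "agr": {"num": "pl"}},
        {"cat": "verb", "agr": {"num": "sg", "pers": "3"}},
    ],
}

# Hypothetical Meta-Rule format: if some reading of the previous token
# unifies with `left`, keep only the current token's readings that unify
# with `target`, and enrich each of them with `add`.
META_RULES: list[dict[str, FS]] = [
    {"left": {"cat": "pron"}, "target": {"cat": "verb"}, "add": {"role": "predicate"}},
    {"left": {"cat": "det"}, "target": {"cat": "noun"}, "add": {"role": "head"}},
]


def project(tokens: list[str]) -> list[list[FS]]:
    """Project lexicon readings onto tokens; unknown words get the empty FS."""
    return [list(LEXICON.get(t.lower(), [{}])) for t in tokens]


def apply_meta_rules(readings: list[list[FS]]) -> list[list[FS]]:
    """Apply each rule in turn, using the left neighbour as context."""
    out = [list(r) for r in readings]
    for rule in META_RULES:
        for i in range(1, len(out)):
            if not any(unify(r, rule["left"]) is not None for r in out[i - 1]):
                continue  # left context does not match this rule
            matched = []
            for r in out[i]:
                u = unify(r, rule["target"])
                if u is not None:
                    enriched = unify(u, rule["add"])
                    if enriched is not None:
                        matched.append(enriched)
            if matched:  # disambiguate only if some reading survives
                out[i] = matched
    return out


if __name__ == "__main__":
    for sentence in (["He", "walks"], ["the", "walks"], ["he", "blorks"]):
        print(sentence, apply_meta_rules(project(sentence)))
```

With "he walks", the first rule discards the noun reading of "walks" and keeps the verb reading, enriched with `role: predicate`; with "the walks", the second rule keeps the noun reading instead. Because an unknown word such as "blorks" receives the empty feature structure, which unifies with any pattern, a rule can also mark tokens the lexicon does not cover. This is one plausible way a mixed lexicon-plus-rules approach could raise coverage, not a description of how TyPTex does it.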

Publication date: 2001